Conversation

@cosmo0920
Contributor

@cosmo0920 cosmo0920 commented Oct 21, 2025

To avoid skipping long lines entirely, in_tail sometimes needs to consume input up to the buffer limit.
This provides an alternative mitigation for handling long lines.

Fixes #10435.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Optional truncation of excessively long log lines with UTF-8‑safe trimming and a new metric counting truncated occurrences.
    • New configuration flag to enable/disable truncation.
  • Bug Fixes

    • Improved resource cleanup on initialization/validation failures.
    • Mutual‑exclusion check to prevent conflicting encoding settings.
  • Tests

    • New tests verifying truncation behavior for long ASCII and UTF‑8 lines.

@coderabbitai

coderabbitai bot commented Oct 21, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a configurable truncate_long_lines option to in_tail, implements UTF‑8‑safe truncation when enabled, registers a new truncation metric/counter, improves init cleanup and encoding validation, updates per-chunk byte accounting for partial consumption, and adds ASCII and UTF‑8 truncation tests.
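To make "UTF‑8‑safe truncation" concrete, here is a minimal sketch of the general technique: pick a byte limit, then back up while the first dropped byte is a UTF‑8 continuation byte (`10xxxxxx`). The function name `sketch_utf8_safe_cut` and its signature are hypothetical and are not taken from this PR; the actual `utf8_safe_truncate_pos()` in tail_file.c may work differently. The sketch also assumes the input is already valid UTF‑8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (NOT the PR's utf8_safe_truncate_pos()).
 * Returns the largest cut <= max such that buf[0..cut) does not end in
 * the middle of a multi-byte UTF-8 sequence. Assumes valid UTF-8 input.
 */
static size_t sketch_utf8_safe_cut(const unsigned char *buf, size_t len,
                                   size_t max)
{
    size_t cut;

    assert(buf != NULL);

    /* Nothing to trim: the whole buffer fits within the limit. */
    if (max >= len) {
        return len;
    }

    cut = max;

    /*
     * buf[cut] is the first byte that would be dropped. If it is a
     * continuation byte (10xxxxxx), cutting here would split a character,
     * so step back until the cut lands on a lead or ASCII byte.
     */
    while (cut > 0 && (buf[cut] & 0xC0) == 0x80) {
        cut--;
    }

    return cut;
}
```

For example, with the bytes of "aéb" (`61 C3 A9 62`) and a limit of 2, the sketch returns 1, so only "a" is kept rather than half of "é".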

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration binding: plugins/in_tail/tail.c | Adds boolean config option truncate_long_lines (default false) bound to struct flb_tail_config via offsetof. |
| Config struct & metrics macro: plugins/in_tail/tail_config.h | Adds #define FLB_TAIL_METRIC_L_TRUNCATED 104, int truncate_long_lines; and struct cmt_counter *cmt_long_line_truncated; to struct flb_tail_config. |
| Init / cleanup / metrics registration: plugins/in_tail/tail_config.c | Replaces flb_free(ctx) error paths with flb_tail_config_destroy(ctx); enforces mutual exclusivity for unicode vs generic encoding; creates and registers the cmt_long_line_truncated counter and exposes it via flb_metrics_add. |
| Truncation logic & decoding: plugins/in_tail/tail_file.c | Adds static utf8_safe_truncate_pos() and integrates a UTF‑8‑safe truncation flow when truncate_long_lines is enabled: computes a safe cut, packs the truncated segment, updates bytes/offsets/skip flags, increments the truncation metric, and adds a dedicated cleanup label and paths. Adjusts decoded-length handling and chunk return behavior. |
| Tests: tests/runtime/in_tail.c | Adds helpers write_long_ascii_line, write_long_utf8_line and two tests (flb_test_in_tail_truncate_long_lines, flb_test_in_tail_truncate_long_lines_utf8) that enable truncate_long_lines and validate emitted segments. |
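The "bound to struct flb_tail_config via offsetof" part of the summary can be illustrated with a generic sketch of offset-based option binding. Everything below is hypothetical: `sketch_tail_config`, `sketch_option`, and `sketch_apply` are stand-ins invented for illustration and are not Fluent Bit's config map API. Only the option name and its default of false come from the summary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct flb_tail_config (one field only). */
struct sketch_tail_config {
    int truncate_long_lines;
};

/* Hypothetical option descriptor: name, default, and field offset. */
struct sketch_option {
    const char *name;
    const char *default_value;
    size_t offset;
};

static const struct sketch_option sketch_options[] = {
    { "truncate_long_lines", "false",
      offsetof(struct sketch_tail_config, truncate_long_lines) },
};

/* Accepts "true" or "on" as truthy; anything else is treated as false. */
static int sketch_parse_bool(const char *value)
{
    return strcmp(value, "true") == 0 || strcmp(value, "on") == 0;
}

/*
 * Looks up key in the option table and writes the parsed value into the
 * matching struct field, located through its byte offset. A NULL value
 * falls back to the option's default. Returns 0 on success, -1 if the
 * key is unknown.
 */
static int sketch_apply(struct sketch_tail_config *ctx, const char *key,
                        const char *value)
{
    size_t i;
    size_t count = sizeof(sketch_options) / sizeof(sketch_options[0]);

    assert(ctx != NULL && key != NULL);

    for (i = 0; i < count; i++) {
        if (strcmp(sketch_options[i].name, key) == 0) {
            int *field = (int *) ((char *) ctx + sketch_options[i].offset);

            if (value == NULL) {
                value = sketch_options[i].default_value;
            }
            *field = sketch_parse_bool(value);
            return 0;
        }
    }

    return -1;
}
```

The point of the offset table is that the generic loader never needs to know the struct layout at the call site; adding an option is a one-line table entry plus a field.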

Sequence Diagram(s)

sequenceDiagram
    participant Reader as File Reader
    participant Processor as in_tail Processor
    participant Truncator as UTF-8 Truncator
    participant Metrics as Metrics Registry
    participant Output as Output Queue

    Note right of Reader: read chunk bytes
    Reader->>Processor: provide chunk
    Processor->>Processor: decode/convert (ret)
    Processor->>Processor: compute eff_max / search newline

    alt newline found within window
        Processor->>Output: emit complete line
    else truncate_long_lines enabled and dec_len >= eff_max
        Processor->>Truncator: find safe cut position
        Truncator->>Processor: cut index
        Processor->>Output: emit truncated segment
        Processor->>Metrics: increment long_line_truncated
        Processor->>Processor: set skip_next, adjust bytes/offsets
    else truncate_long_lines disabled
        Processor->>Processor: skip/drop until newline (existing behavior)
    end

    Processor->>Reader: return processed byte count / update DB offsets
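The branch selection in the diagram can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in tail_file.c: the enum, the function name `sketch_decide`, and its parameters are hypothetical. The diagram does not say what happens when no newline is found and dec_len is still below eff_max, so the sketch assumes the processor waits for more data in that case, and it assumes the skip path of the disabled branch is also only taken once eff_max is reached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes, named after the branches of the diagram. */
enum sketch_action {
    SKETCH_EMIT_LINE,          /* newline found within the window        */
    SKETCH_EMIT_TRUNCATED,     /* truncate_long_lines on, limit reached  */
    SKETCH_SKIP_UNTIL_NEWLINE, /* truncate_long_lines off, limit reached */
    SKETCH_WAIT_FOR_MORE       /* assumption: below limit, keep reading  */
};

/*
 * Chooses an action for the current decoded buffer.
 *   newline_found:       non-zero if a newline lies within the window
 *   truncate_long_lines: non-zero if the new option is enabled
 *   dec_len:             number of decoded bytes currently buffered
 *   eff_max:             effective maximum length before acting
 */
static enum sketch_action sketch_decide(int newline_found,
                                        int truncate_long_lines,
                                        size_t dec_len, size_t eff_max)
{
    if (newline_found) {
        return SKETCH_EMIT_LINE;
    }

    /* Assumption: below the limit, neither truncation nor skipping applies. */
    if (dec_len < eff_max) {
        return SKETCH_WAIT_FOR_MORE;
    }

    if (truncate_long_lines) {
        return SKETCH_EMIT_TRUNCATED;
    }

    return SKETCH_SKIP_UNTIL_NEWLINE;
}
```

In the real flow, the truncated branch would also increment the long_line_truncated counter and set skip_next, as the diagram shows; those side effects are left out here to keep the decision itself testable.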

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • plugins/in_tail/tail_file.c — truncation control flow, UTF‑8 backtracking correctness, and cleanup at truncation_end.
    • Byte/accounting updates in flb_tail_file_chunk and interaction with DB offsets and last_processed_bytes.
    • Metric registration/lifetime and correct exposure via flb_metrics_add in tail_config.c.
    • New tests in tests/runtime/in_tail.c for ASCII vs UTF‑8 edge cases.

Possibly related PRs

Suggested reviewers

  • edsiper
  • koleini
  • fujimotos

🐰 I nibble bytes that wander long,
I trim with care where lines go wrong.
UTF-8 safe cuts, a metric to show,
I hop, I count, then onward they go. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: implementing long line truncation in the in_tail plugin. |
| Linked Issues check | ✅ Passed | The PR comprehensively implements long line truncation as requested in issue #10435, adding truncate_long_lines configuration, metrics tracking, UTF-8 safe truncation logic, and corresponding tests. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing long line truncation: configuration binding, metrics infrastructure, truncation logic, and unit tests remain in scope. |
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cosmo0920-implement-long-line-truncation

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d2cf71 and 80ac24c.

📒 Files selected for processing (1)
  • tests/runtime/in_tail.c (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/runtime/in_tail.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: PR - fuzzing test


@cosmo0920 cosmo0920 force-pushed the cosmo0920-implement-long-line-truncation branch from dbb5d14 to 3119662 Compare October 21, 2025 11:28
@cosmo0920 cosmo0920 marked this pull request as ready for review October 22, 2025 11:51
@edsiper
Member

edsiper commented Nov 9, 2025

@cosmo0920 we need commit history cleanup to get this merged

@cosmo0920
Contributor Author

cosmo0920 commented Nov 10, 2025

Yup. I cleaned up my commit history into 2 commits: the implementation and its unit tests.

@cosmo0920 cosmo0920 force-pushed the cosmo0920-implement-long-line-truncation branch from 8d2cf71 to 80ac24c Compare November 10, 2025 03:58
@edsiper edsiper merged commit e9f63e6 into master Nov 10, 2025
46 checks passed
@edsiper edsiper deleted the cosmo0920-implement-long-line-truncation branch November 10, 2025 04:14
Development

Successfully merging this pull request may close these issues.

Default for skip_long_lines should be ON -or- Long Lines should truncate

3 participants